77 research outputs found

Consistency of Feature Markov Processes

Authors: Hutter, Marcus; Sunehag, Peter
Publication date: 01/01/2010

We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation. The state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense that it will provably, eventually, choose only between alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of and, furthermore, we briefly discuss the active setting where an agent takes actions to achieve desirable outcomes.
Comment: 16 LaTeX pages

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Coding of non-stationary sources as a foundation for detecting change points and outliers in binary time-series

Authors: Hutter, Marcus; Shao, Wen; Sunehag, Peter
Publication venue: Australian Computer Society
Publication date: 01/12/2012

An interesting scheme for estimating and adapting distributions in real time for non-stationary data has recently been the focus of study for several different tasks relating to time series and data mining, namely change point detection, outlier detection and online compression/sequence prediction. An appealing feature is that, unlike more sophisticated procedures, it is as fast as the related stationary procedures, which are simply modified through discounting or windowing. The discount scheme makes older observations lose their influence on new predictions. The authors of this article recently used a discount scheme to introduce an adaptive version of the Context Tree Weighting compression algorithm. The mentioned change point and outlier detection methods rely on the changing compression ratio of an online compression algorithm. Here we begin to provide theoretical foundations for the use of these adaptive estimation procedures, which have already shown practical promise.

The Australian National University
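
The discounting idea in the abstract above (older observations lose influence on new predictions) can be illustrated with a minimal sketch. It assumes a binary sequence, exponentially decayed counts, and a KT-style additive term of 1/2; the function name `discounted_predictions` and the parameter `gamma` are hypothetical, and this is not necessarily the exact estimator analysed in the paper.

```python
def discounted_predictions(bits, gamma=0.9):
    """Return the predicted probability of a 1 before each observation.

    Assumption: counts are multiplied by gamma (0 < gamma <= 1) before each
    update, so an observation made k steps ago carries weight gamma**k.
    With gamma = 1.0 this reduces to the stationary KT estimator.
    """
    n0 = 0.0  # discounted count of zeros
    n1 = 0.0  # discounted count of ones
    predictions = []
    for b in bits:
        # Predict first, using only past observations.
        predictions.append((n1 + 0.5) / (n0 + n1 + 1.0))
        # Decay the old evidence, then add the new observation.
        n0 *= gamma
        n1 *= gamma
        if b == 1:
            n1 += 1.0
        else:
            n0 += 1.0
    return predictions


if __name__ == "__main__":
    # A source that switches from all zeros to all ones.
    sequence = [0] * 50 + [1] * 10
    print(discounted_predictions(sequence, gamma=0.8)[-1])
    print(discounted_predictions(sequence, gamma=1.0)[-1])
```

After the switch, the discounted estimator assigns a higher probability to 1 than the stationary one, at the same cost per step.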

The Sample-Complexity of General Reinforcement Learning

Authors: Hutter, Marcus; Lattimore, Tor; Sunehag, Peter
Publication date: 01/06/2013

We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log^2 N) time-steps with high probability. Infinite classes are also considered, where we show that compactness is a key criterion for determining the existence of uniform sample-complexity bounds. A matching lower bound is given for the finite case.
Comment: 16 pages

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Concentration and Confidence for Discrete Bayesian Sequence Predictors

Authors: Hutter, Marcus; Lattimore, Tor; Sunehag, Peter
Publication date: 29/06/2013

Bayesian sequence prediction is a simple technique for predicting future symbols sampled from an unknown measure on infinite sequences over a countable alphabet. While strong bounds on the expected cumulative error are known, there are only limited results on the distribution of this error. We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence. We also consider the problem of constructing upper confidence bounds on the KL and Hellinger errors similar to those constructed from Hoeffding-like bounds in the i.i.d. case. The new results are applied to show that Bayesian sequence prediction can be used in the Knows What It Knows (KWIK) framework with bounds that match the state of the art.
Comment: 17 pages

arXiv.org e-Print Archive

CiteSeerX

The Australian National University
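
As a rough illustration of the Bayesian sequence prediction described in the abstract above, the following sketch uses a small finite class of Bernoulli models in place of the countable class of measures the paper considers. All names (`predict_one`, `update`, `kl_bernoulli`, `cumulative_kl`) are hypothetical, and every parameter is assumed to lie strictly between 0 and 1 so the logarithms are defined.

```python
import math


def predict_one(thetas, weights):
    """Mixture probability that the next symbol is 1."""
    return sum(w * t for w, t in zip(weights, thetas))


def update(thetas, weights, bit):
    """Bayes rule: reweight each model by the likelihood of the observed bit."""
    likelihoods = [t if bit == 1 else 1.0 - t for t in thetas]
    unnormalised = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), both in (0, 1)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def cumulative_kl(true_theta, thetas, weights, bits):
    """Sum of per-step KL errors of the mixture along one observed sequence."""
    total = 0.0
    for b in bits:
        total += kl_bernoulli(true_theta, predict_one(thetas, weights))
        weights = update(thetas, weights, b)
    return total


if __name__ == "__main__":
    models = [0.2, 0.5, 0.8]          # assumed model class
    prior = [1.0 / 3.0] * 3           # uniform prior weights
    observed = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    print(cumulative_kl(0.8, models, prior, observed))
```

The value returned by `cumulative_kl` is the kind of realised cumulative error whose distribution the abstract refers to; the sketch does not reproduce any of the paper's bounds.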

A dual process theory of optimistic cognition

Authors: Hutter, Marcus; Sunehag, Peter
Publication venue: The Cognitive Science Society
Publication date: 01/01/2014

Optimism is a prevalent bias in human cognition, including variations like self-serving beliefs, illusions of control and overly positive views of one's own future. Further, optimism has been linked with both success and happiness. In fact, it has been described as a part of human mental well-being, which has otherwise been assumed to be about being connected to reality. In reality, only people suffering from depression are realistic. Here we study a formalization of optimism within a dual process framework and study its usefulness beyond human needs in a way that also applies to artificial reinforcement learning agents. Optimism enables systematic exploration, which is essential in a (partially) unknown world. The key property of an optimistic hypothesis is that if it is not contradicted when one acts greedily with respect to it, then one is well rewarded even if it is wrong.

eScholarship - University of California

The Australian National University
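
The key property stated in the abstract above can be illustrated with a toy sketch. It assumes a finite class of deterministic one-step environments, each represented as a mapping from action to reward, with the true environment in the class; the function `optimistic_agent` and this setup are illustrative assumptions, not the paper's dual process formalization.

```python
def optimistic_agent(hypotheses, true_env, steps):
    """Act greedily on the most optimistic hypothesis not yet contradicted.

    hypotheses: list of dicts mapping action -> promised reward (assumed class).
    true_env:   dict mapping action -> actual reward; assumed to be one of
                the hypotheses, so the set of surviving hypotheses is never empty.
    """
    alive = list(hypotheses)
    rewards = []
    for _ in range(steps):
        # Optimistic choice: the hypothesis promising the highest reward.
        optimistic = max(alive, key=lambda h: max(h.values()))
        action = max(optimistic, key=optimistic.get)
        reward = true_env[action]
        rewards.append(reward)
        # Discard every hypothesis contradicted by the observation.
        alive = [h for h in alive if h[action] == reward]
    return rewards, alive


if __name__ == "__main__":
    h0 = {"a": 1.0, "b": 0.0}
    h1 = {"a": 0.0, "b": 0.5}
    h2 = {"a": 0.2, "b": 0.9}
    print(optimistic_agent([h0, h1, h2], true_env=h1, steps=3))
```

Whenever the optimistic hypothesis is not contradicted, the agent receives the reward it promised, which is at least the best reward the true environment offers; each contradiction removes at least one hypothesis, so under these assumptions there are at most N - 1 suboptimal steps for a class of size N.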

Principles of Solomonoff Induction and AIXI

Authors: Hutter, Marcus; Sunehag, Peter
Publication date: 01/01/2011

We identify principles characterizing Solomonoff Induction by demands on an agent's external behaviour. Key concepts are rationality, computability, indifference and time consistency. Furthermore, we discuss extensions to the full AI case to derive AIXI.
Comment: 14 LaTeX pages

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Axioms for Rational Reinforcement Learning

Authors: Hutter, Marcus; Sunehag, Peter
Publication venue: Springer Science and Business Media LLC
Publication date: 24/02/2016

We provide a formal, simple and intuitive theory of rational decision making, including sequential decisions that affect the environment. The theory has a geometric flavor, which makes the arguments easy to visualize and understand. Our theory is for complete decision makers, which means that they have a complete set of preferences. Our main result shows that a complete rational decision maker implicitly has a probabilistic model of the environment. We have a countable version of this result that sheds light on the issue of countable vs. finite additivity by showing how it depends on the geometry of the space over which we have preferences. This is achieved by fruitfully connecting rationality with the Hahn-Banach Theorem. The theory presented here can be viewed as a formalization and extension of the betting odds approach to probability of Ramsey and De Finetti [Ram31, deF37].

The Australian National University

Feature reinforcement learning: state of the art

Authors: Daswani, Mayank; Hutter, Marcus; Sunehag, Peter
Publication venue: Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 28/07/2014

Feature reinforcement learning was introduced five years ago as a principled and practical approach to history-based learning. This paper examines the progress since its inception. We now have both model-based and model-free cost functions, most recently extended to the function approximation setting. Our current work is geared towards playing ATARI games using imitation learning, where we use Feature RL as a feature selection method for high-dimensional domains.

The Australian National University